-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix __ne__ comparison for Categorical #32304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/arrays/categorical.py
Outdated
@@ -103,7 +103,10 @@ def func(self, other): | |||
mask = (self._codes == -1) | (other_codes == -1) | |||
if mask.any(): | |||
# In other series, the leads to False, so do that here too | |||
ret[mask] = False | |||
if opname == "__ne__": | |||
ret[mask & (self._codes == other_codes)] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This redundancy was intentional and an attempt to "reuse" the mask
calculation above since I think it may short-circuit when mask
is False
, but could just make it (self._codes == -1) & (other_codes == -1)
as well. I also think __ne__
is the only situation that'd have to be special-cased here out of the comparison operators. Also worth pointing out that this is assuming we're dealing with NaN
rather than NA
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self._codes == -1) & (other_codes == -1)
I think this would be more clear. Also there might be a cached _isnan
attribute for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsaxton im working on cleaning up these comparisons, am confused by this line. can we make this use the more common pattern
fill_value = True if op is operator.ne else False
[...]
ret = op(self._codes, other_codes)
mask = (self._codes == -1) | (other_codes == -1)
ret[mask] = fill_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel That is much simpler and seems correct to me
@@ -282,6 +282,16 @@ def _compare_other(self, s, data, op_name, other): | |||
with pytest.raises(TypeError, match=msg): | |||
op(data, other) | |||
|
|||
def test_not_equal_with_na(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we get this right in cases with e.g. datetime64 or datetime64tz categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to work, added some parameterization over other category types
thanks @dsaxton |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff